- Update Manual
-
- translator.library - Version 43.1
- Update from Version 42.4
-
-
- 3 July 1997
-
- M. L. Barlow
-
-
- 1. Status.
-
- Version 42.4/43.1 of the translator library is not in the public domain.
- Source is available for Version 42.4. The library and accent files are
- freely distributable provided no profit is made from them. Accent files may
- have additional or separate restrictions placed on them by their authors.
-
- 2. Introduction.
-
- This version of the Translator Library was developed from the source code
- posted to the Aminet by Francesco Devitt. As such it remains largely his
- work. This version adds new accent file rules to facilitate number
- expression and fixes three problems I have encountered with the original
- Translator42. Version 43.1 adds the placement of the Narrator escape code
- sequence after the end of the translated text output string to achieve
- compatibility with those programs that give the whole translation buffer to
- the Narrator. The basic changes from version 42.4 to 43.0 are as follows:
-
- (Enhanced Syntax for Accent Files)
-
- 2.1 Added Empty Match Condition.
-
- Extra text may be inserted into the output string based on the text
- pattern defined by the prefix and suffix rules alone. This allows for the
- insertion of "thousand" or "hundred" in number strings with a single
- statement. An empty match is indicated by [¶] or [¶@] in the match string
- (¶ is ALT-P). Only one empty match is allowed at any text location.
-
- 2.2 Added Suffix Text Induction Feature.
-
- Suffix pattern matching text characters may now be pulled into the
- bracket delimited text-replacement string. Text may be pulled in ahead
- {& replace-text } or behind {! replace-text} the replacement text.
- Multiple characters may be pulled in by repeating the & or ! characters.
- The whole suffix match will be pulled in if {&* or {!* is specified. This
- last feature can be used to convert $45,701 to {45,701 dollars} with the
- rule:
-
- %class numeric 0 1 2 3 4 5 6 7 8 9 \. \,
- [$](numeric+) = {&* dollars}
-
- 2.3 Added Zero or One Match Condition.
-
- It is now possible to specify a zero or one match condition. This
- allows the specification of optional prefixes or suffixes that only occur
- once.
-
- (Problems Solved)
-
- 2.4 Fixed Word Separator Problem.
-
- Translator42 does not recognize the same set of word separators as the
- original Translator37. This causes unusual pronunciation when punctuation
- marks or numbers are combined with text. In Translator43 the %Separator
- statement of Translator42 has been replaced by an %Alphabet statement. All
- characters not in this Alphabet Set are treated as word-separators. The
- default alphabet does include the ISO-8859-1 international characters.
-
- 2.5 Fixed Buffer Overrun Problem.
-
- Translator42 may crash your system if the buffer that is provided by the
- program using this library is not large enough to handle the resulting
- text. The translator is supposed to stop short if this happens and report
- how much text it did translate. Translator37 does this. However,
- Translator42 can fail to notice the end of the buffer and continue on
- writing into unauthorized memory space. This problem is now fixed.
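
The stop-short behaviour described above can be sketched as follows. This is a minimal illustration in Python, not the library's code: the name bounded_copy and the word-at-a-time model are assumptions made for the example. Whole words are appended to an output of fixed capacity; when the next word would not fit, writing stops and the count of characters written is reported as a negative number.

```python
def bounded_copy(words, capacity):
    """Append words to an output of at most `capacity` characters.

    Returns (output, status). status is 0 if everything fit, otherwise
    the negated count of characters written before stopping.
    Hypothetical helper: one position is reserved for a terminating null.
    """
    out = []
    used = 0                                   # characters written so far
    for word in words:
        need = len(word) + (1 if out else 0)   # word plus separating space
        if used + need > capacity - 1:         # keep room for the null
            # Stop at the last complete word instead of overrunning.
            return " ".join(out), -used
        out.append(word)
        used += need
    return " ".join(out), 0
```

With a capacity of 8, bounded_copy(["HEHLOW", "WERLD"], 8) keeps only the first word and returns -6. A real implementation would also need a distinct code for the case where nothing fits at all.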
-
- 2.6 Fixed The False, In-line Text Command Problem.
-
- Translator42 allows accent and scope changes to be made by in-line text
- commands delimited by simple braces and backslashes. This can be a problem
- when reading general text that may contain these characters or ASCII art.
-
- Translator42 does allow this feature to be turned off by using the
- Translator42 preference tool to delete these characters in the boxes
- provided. It should not be necessary to do this with Translator43.
-
- Translator43 reduces the severity of this problem by requiring that a
- rubout character, 7F hex, precede each in-line text command in the text
- being read.
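
The effect of the rubout qualifier can be sketched as below. This is only an illustration under stated assumptions: the function name find_inline_commands is hypothetical, and it assumes the rubout character sits immediately before the backslash that introduces a command. Backslashes in ordinary text or ASCII art are then never taken for commands.

```python
RUBOUT = "\x7f"  # ASCII 127 decimal, 7F hex

def find_inline_commands(text, intro="\\"):
    """Return the indexes of command introducers preceded by RUBOUT.

    `intro` is the character that introduces a command; a backslash is
    assumed here for the example.
    """
    hits = []
    for i, ch in enumerate(text):
        if ch == intro and i > 0 and text[i - 1] == RUBOUT:
            hits.append(i)
    return hits
```

For example, the text "ASCII art: \o/ {x}" yields no commands, while a backslash preceded by the rubout character is reported.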
-
-
- 2.7 Added Assembly Code Modules.
-
- Several simple repetitive routines, including the built-in unsigned byte
- strchr() routine, have been replaced by hand optimized assembly modules for
- increased processing speed.
-
- (Problem Avoided)
-
- 2.8 Dropped External Language Reference Rule Capability.
-
- This feature only works in Translator42 if the user does not disable or
- change the definitions of the in-line language or scope changing rules.
- This rarely used feature was dropped due to the performance impact and
- complexity of the filter that would be required to prevent text induction
- of false scope codes.
-
- 3. Requirements.
-
- A complete installation of Translator42.4 or Translator43.0 is required.
- See section 3 of Translator.man supplied with Translator42 for that
- installation procedure, if required. Upgrading to 43.0 or downgrading to
- 42.4 is not required for this patch.
-
- 4. Installation.
-
-
- 4.1. (Optional Precaution) Back-up your SYS: partition, or Libs:
- directory.
-
- 4.2. Install Translator42.4 if this has not been done. This step is _NOT_
- required if you have already upgraded to Translator43.0. This patch
- only works on fully installed versions of Translator42.4 or 43.0.
-
- 4.3. Unpack the Tran43pch archive to a convenient directory.
-
- 4.4. Stop and close out (exit/quit) all programs that might be using the
- translator.library. If possible, don't start any such programs
- before the installation.
-
- 4.5. Run the Installer Script by clicking on the Install icon in the
- unpacking directory. This patches the installed sub-type (v.33, v37,
- or 020) of Translator42.4 or Translator43.0 to the equivalent
- Translator43.1 sub-type. The previous translator.library will be
- renamed to translator42.4xlibrary or translator43.0xlibrary and a new
- translator.library will be patched in. If you are upgrading from
- 42.4 and have the Italiano.accent in your Locale:accents directory,
- this accent will be renamed to Italiano42.Xaccent and a new
- Italiano.accent will also be patched in. This new accent has only
- one line changed for compatibility with Translator43. If you elect
- to create a log file, this file will be created in the unpack
- directory you have selected.
-
- 4.6. If the old translator.library was resident in RAM due to prior use,
- it may be necessary to use "avail flush" or "flushlib
- translator.library remove" or reboot the system before the upgrade
- takes effect.
-
- 4.7. (Optional) copy the Update.man file to a directory of your choice for
- future reference.
-
- 4.8. (Optional) Unpack the Specialized Translator43 Accent files and copy
- them to Locale:accents.
-
-
- 5. New Accent Files.
-
- The new accent files will be uploaded independently. I am using an
- "Ax_(n)" format prefix on the archive name to group them together and
- assure that version information is not lost if the names are truncated to
- 8.3 format. Use a directory utility to copy these demo accent files to
- your Locale:Accents directory if you wish. Most of these accent files are
- experimental, as I only speak USA-English. I have chosen city names to
- indicate the experimental nature of these accents.
-
- 5.1 Berlin.Accent. (Ax_1Berlin.lha)
-
- An experimental German accent demonstrating some of the new features of
- this version and some special phoneme combinations to overcome the lack of
- the proper German CH sounds in Narrator 37.7. The name "Berlin" is chosen
- to distinguish it from the authentic deutsch.accent developed by native
- German speakers. It was developed from the deutsch.accent, version 0.1 by
- Stefan Zeiger and the rules stated in the Pronunciation chapter, pp.
- 265-267, of "Der Anfang (Understanding and Using German)" by Harold von
- Hofe, 1958. This accent is optimized for use with Narrator 37.7 and
- requires Translator43.
-
- 5.2 Chaucer.Accent. (Ax_0Chaucer.lha)
-
- An experimental generalized Middle English accent. English before the
- "Great Vowel Shift" with trilled Rs and guttural gh sound. Based on the
- brief description of Middle English in "A History of the English Language"
- by Albert C. Baugh. This accent has been optimized with Narrator 37.7 and
- requires Translator43.
-
- 5.3 Paris.Accent. (Ax_0Paris.lha)
-
- A rather crude experimental stop-gap French accent developed from
- several English guides on French pronunciation. Optimized with Narrator
- 37.7. Requires Translator43.
-
- 5.4 !USA.Accent. (Ax_1USA.lha)
-
- This is an extensive USA American accent developed from scratch. It
- is about 12 times the size of the standard "American accent." The goal is
- to approximate the USA Broadcast Standard Accent. The following features
- are included:
-
- a. Arabic and Roman numeral conversion
-
- b. Silent 'e' detection in many compound words
-
- c. British spelling recognition
-
- d. Resolution strategies for some common homographs (lead, live, read,
- wind), words with the same spelling but different pronunciations.
-
- e. Recognition of many place and personal names
-
- f. Conversational, non-formal pronunciation
-
- The large size of this accent allows a much higher accuracy than the
- standard American accent. However, 68000 based systems may experience a 2
- second delay per 80 character line of text and a 30 to 60 second initial
- loading delay while the accent is compiled at first use. The speech is
- quite snappy on a 68040 based system at my recommended 210 word per minute
- speaking rate setting.
-
- The basic reason for the large size of this file is that the spelling of
- English words has become more a matter of tradition than phonetics. The
- traditional spellings of most basic English words were established in the
- Middle English period based on a Latin model. Most of the changes in
- English pronunciation since that time are not reflected in the spelling.
- These changes include final e silencing, loss of the guttural gh sound,
- and the Great (Long-)Vowel Shift. Words imported or "borrowed" from other
- languages tend to retain their native spellings. The rules for spelling and
- pronouncing imported words from classical Latin and Greek have remained
- essentially the same. (This is why "machine" does not rhyme with "shine".)
- Through these and other processes, the pronunciation rules for English
- words have become quite complicated.
-
-
- 6. Translator Preferences Program and Other Utilities.
-
- See the Translator42 Translator.man for information on the Translator42
- utilities. For Translator43, the example of how a new accent may be
- selected by in-line text should be modified to read as follows:
-
- \english{ Hello. Beastly hot weather this!
- Yes, hello. My name is {\maori Hone Ropata}
- and I am \maori{Maori.}}
-
- The "rub-out" character, ASCII 127 decimal (7F hex), is non-printing and
- may not be visible in the example above. It was added in Translator43 as
- an additional qualifier to reduce problems reading general text that may
- contain ASCII art. Do NOT try to insert this "rub-out" character in the
- preference boxes.
-
- * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
- * Note: The information in the following section is primarily for *
- * those who wish to create or modify accent files. Other users may *
- * ignore this section. *
- * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
-
- 7. Accent file Format.
-
- For the most part, accent files for Translator42 will be compatible with
- Translator43. The Italiano.accent must be changed because it has an
- embedded reference to the English accent that is no longer supported. A
- patch for a modified Italiano.accent is included. The only possible
- problems would be with files that use the defunct %separator directive,
- that use [¶ in the match string, or that use {! or {& in the phoneme string.
- See translator.man for a complete description of the basic format. Each
- line of the file may be one of the following types:
-
-
- 1. Blank lines are ignored.
-
- 2. Comment lines beginning with `#' are ignored.
-
- 3. Directives begin with `%'.
-
- 4. All other lines are pronunciation rules.
-
-
- 7.1. Directives. Directives in an accent file are introduced by a percent
- character (%) followed by the name of the directive and its arguments.
- Below is a summary of the directives in Translator43. The Translator42
- manual, translator.man, is referenced where no change applies.
-
-
- 7.1.1. Directive: stress.
-
- Syntax: %stress <N>
-
- Description: See Translator.man.
-
-
- 7.1.2. Directive: emphasis.
-
- Syntax: %emphasis <N>
-
- Description: See Translator.man.
-
- 7.1.3. Directive: class
-
- Syntax: %class <member> [ <member> ... ]
-
- Description: See Translator.man.
-
- 7.1.4. Directive: complain
-
- Syntax: %complain <level>
-
- Description: See Translator.man.
-
- 7.1.5 Directive: alphabet (new with version 43)
-
- Syntax: %alphabet <character list>
-
-
- Description: This command can be used to define the *characters*
- that constitute words. The default is approximately equivalent to the
- following:
-
- %alphabet aáàâãä bcd ð éèêë fgh iíìîï jklmnñ oóòôõö pqrs ß t þ uúùûü vwxyz
-
- Note that UNLIKE the class entries, only single characters are accepted;
- spaces are ignored entirely and do not act as delimiters. These characters
- are used to build a random access lookup table that defines the alphabet
- status of each character. Each character in the list causes its table
- entry to be set to one.
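
The lookup table described above might be built along these lines. This is a minimal sketch, not the library's code: the names build_alphabet_table and is_separator are hypothetical, and a 256-entry table indexed by ISO-8859-1 character code is assumed.

```python
def build_alphabet_table(char_list):
    """Build a 256-entry table: 1 for alphabet characters, 0 otherwise."""
    table = [0] * 256
    for ch in char_list:
        if ch == " ":
            continue                 # spaces are ignored, not delimiters
        code = ord(ch)
        if code < 256:               # single ISO-8859-1 characters only
            table[code] = 1
    return table

def is_separator(table, ch):
    """Any character not in the alphabet is treated as a word separator."""
    code = ord(ch)
    return code >= 256 or table[code] == 0
```

With the list "abc dé", five table entries are set; the space, digits, and punctuation all remain word separators.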
-
- 7.1.6 Directive: separator (obsolete, not recognized)
-
-
- 7.2 New Context Rules. See 7.2 of translator.man for the basic context
- rules.
-
- 7.2.1 Background: General Sequence of Operation.
-
- The client program that calls the Translator Library provides a pointer
- to the text string to be translated, its length in bytes, a pointer to a
- buffer to hold the translated output, and its length in bytes. The input
- source string is copied to a new reference buffer, delimited on each end
- with nulls, and converted to upper-case.
-
- Then the source reference buffer is translated by a progressive, single
- pass process that searches character by character for applicable rules.
- Rules consist of:
-
- (1.) an optional prefix requirement string,
-
- (2.) a mandatory [match] string,
-
- (3.) an optional suffix requirement string, and
-
- (4.) a mandatory = followed by a phoneme replacement string, or a {text
- replacement string}, or an empty space.
-
- Examples:
-
-   LAB[OURA]TORY = RAH                | OURA is converted to phonemes RAH.
-   $T[A]K(vowel)$ = EY4               | A is converted to EY4.
-                                      |
-   $["WW\ II"]$ = { world war two }   | WW II is converted to the string
-                                      | "world war two"; that string may
-                                      | then be translated independently to
-                                      | something like "WER4LD WAA4R TUW4".
-                                      |
-   $MAK[E]$ =                         | Silent E: no phonemes are output.
-
- Each rule's [match] string is compared with the text in the reference
- buffer; then the prefix and the suffix requirements are tested. If the
- prefix and suffix requirements are satisfied, then the replacement rule is
- applied. This rule either provides output phoneme text directly (the
- normal case) or provides a replacement string that is decoded in isolation
- to create the phonemes for the matched text. The process then continues on
- the source reference buffer at the next unmatched character. When the end
- of the source reference buffer is reached, a normal null return is
- executed. If the end of the output buffer is reached first, the output
- buffer is closed off with a null at the end of the last fully translated
- word and the routine returns a negative number representing the number of
- fully translated characters (if less than -8).
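
The single-pass search described above can be sketched as follows. This is a simplified illustration, not the library's algorithm: the prefix, match, and suffix are treated as plain literal strings (no pattern codes), unmatched characters are passed through unchanged, and the name translate and the tuple layout are assumptions made for the example.

```python
def translate(source, rules):
    """Apply the first rule whose match, prefix, and suffix all fit.

    Each rule is a (prefix, match, suffix, phonemes) tuple of literal
    strings. The source is converted to upper-case first.
    """
    source = source.upper()
    out = []
    i = 0
    while i < len(source):
        for prefix, match, suffix, phonemes in rules:
            if not match:
                continue                      # empty matches not modelled
            end = i + len(match)
            if (source.startswith(match, i)          # [match] string
                    and source.endswith(prefix, 0, i)    # text before it
                    and source.startswith(suffix, end)):  # text after it
                out.append(phonemes)
                i = end                       # resume at next unmatched char
                break
        else:
            out.append(source[i])             # no rule applied (assumption)
            i += 1
    return "".join(out)
```

Rules are tried in the order given, so an earlier rule hides a later one that would also fit at the same position.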
-
- 7.2.2 Modified: Pattern Codes.
-
- The left and right contexts are strings which may contain pattern codes.
- These include:
-
-   (<class>)    Must match one member of a class
-   (<class>+)   Must match one or more members of a class
-   (<class>;)   Must match zero or only one member of a class    <new>
-   (<class>*)   Must match zero or more members of a class
-   (<class>~)   Must not be a member of a class
-   @            Must be an alphabet character                    <new>
-   $            Must be a non-alphabet word separator            <modified>
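
For readers familiar with regular expressions, the class pattern codes above correspond roughly to the fragments produced by the sketch below. This is an analogy only: the function class_pattern is hypothetical, and treating (<class>~) as a zero-width "next text is not a member" test is an assumption about behaviour that the manual does not spell out.

```python
import re

def class_pattern(members, modifier=""):
    """Return a regex fragment for one (<class>) code with its modifier."""
    # Longest members first so multi-character members are preferred.
    ordered = sorted(members, key=len, reverse=True)
    group = "(?:" + "|".join(re.escape(m) for m in ordered) + ")"
    if modifier == "":
        return group                  # exactly one member
    if modifier == "+":
        return group + "+"            # one or more members
    if modifier == ";":
        return group + "?"            # zero or only one member
    if modifier == "*":
        return group + "*"            # zero or more members
    if modifier == "~":
        return "(?!" + group + ")"    # must not be a member (assumed zero-width)
    raise ValueError("unknown modifier: " + modifier)
```

For instance, class_pattern(digits) + class_pattern(digits, "~") accepts a single digit that is not followed by another digit.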
-
- 7.2.3 New: Empty Match.
-
- The match string may now contain the empty match indicator ¶ (ALT P). If
- the prefix and suffix condition match, then the specified phonemes or text
- characters are inserted at that point. An empty match does not, BY ITSELF,
- advance the translation pointer on the input reference buffer. Once an
- empty match has been found and executed, it and all preceding empty
- matches are disabled until the current reference buffer character has been
- processed. Empty matches may be hidden by a preceding normal match, but
- they will not hide succeeding empty or normal matches at the same character
- position on the input reference buffer or current source string.
-
- 7.2.4 Background: Search Lists.
-
- As in Translator42, rules are placed in one of 27 lists, depending on the
- first character of the match string. Thus, when we encounter a letter `A'
- in the source text to be translated, we save time by only looking through
- the list of rules with match strings beginning with the letter `A'. The
- relative order of the rules in each list is the same as that of the whole
- accent file.
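
The list arrangement described above can be sketched as below. This is an illustration only: the function names and the (match, phonemes) tuple layout are assumptions. List zero holds rules whose match string begins with a non-alphabetic character, and lists 1 to 26 hold the letters A to Z.

```python
def list_index(ch):
    """Return 1..26 for the letters A..Z, and 0 for anything else."""
    ch = ch.upper()
    if len(ch) == 1 and "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1
    return 0

def build_rule_lists(rules):
    """Group (match, phonemes) rules into 27 lists by first match character.

    Appending in file order keeps the relative order of rules in each
    list the same as in the accent file.
    """
    lists = [[] for _ in range(27)]
    for rule in rules:
        match = rule[0]
        lists[list_index(match[:1])].append(rule)
    return lists
```

When the letter `A' is met in the source text, only lists[list_index("A")] needs to be searched.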
-
- 7.2.5 New: Rule List Cross-Posting.
-
- The rules for the empty match strings go in list zero for non-alphabetic
- character rules. On an empty match condition, the current source text
- character is equivalent to the first character of the rule's suffix pattern
- rather than the first character of the match string. Thus, if this
- character corresponds to one of the other 26 lists, the rule would not be
- found. To enable empty matches in these other lists, the syntax [¶@]
- (ALT-P Shift-2) has been added to cause an empty match rule to be
- cross-posted to all lists. Each cross-post stub references the entry in
- list zero. Empty match cross-posting is not automatic because the
- predominant usage of this function is with numbers, where it is not
- required. Cross-posting is only required if the first character of the
- required suffix pattern may be alphabetic.
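
Cross-posting might be pictured as in the sketch below. It is a hedged illustration, not the library's data structure: the helper names and the dictionary used to stand for a rule are hypothetical. An empty match rule always goes in list zero; when [¶@] is used, every other list also receives a reference to that same entry.

```python
def new_lists():
    """Return 27 empty rule lists: list 0 plus one per letter A..Z."""
    return [[] for _ in range(27)]

def list_index(ch):
    """Return 1..26 for the letters A..Z, and 0 for anything else."""
    ch = ch.upper()
    if len(ch) == 1 and "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1
    return 0

def post_empty_rule(lists, rule, cross_post=False):
    """Post an empty match rule; cross_post models the [¶@] syntax."""
    lists[0].append(rule)
    if cross_post:
        for n in range(1, 27):
            lists[n].append(rule)   # stub: same object as the list-zero entry
```

A rule posted without cross_post is found only when the current source character is non-alphabetic, which is sufficient for number rules.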
-
- 7.2.6 NEW: Suffix Text Induction.
-
- Translator43 allows the induction of text following matched text into the
- text replacement string on text replacement rules. Text induction modes
- are indicated by the first character after the leading brace. In these
- modes, a temporary string is created that combines the bracketed text from
- the rule's replacement text string with a delimited number of suffix
- characters. Leading or text swap induction is indicated by {& ...} and
- trailing induction is indicated by {!...}. For example, the rule:
-
- [2](digit)(digit~) = {& and twenty}
-
- will create a replacement string "4 and twenty" for 24 and the rule:
-
- [2](digit)(digit~) = {!twenty }
-
- will create a replacement string "twenty 4" for 24. This feature should
- only be used where the suffix pattern rule guarantees the nature of the
- induced characters. Text induction advances the current source pointer
- for each character induced even with an empty match.
-
-
- 7.2.7 New: Text Induction Syntax.
-
- The number of characters to be induced may be specified by repeating the
- text induction indicator. For example, the rules:
-
-
- #short number indicator
- %class Ñ 0 1 2 3 4 5 6 7 8 9
- #short number or number and comma separator
- %class Ç 0\, 1\, 2\, 3\, 4\, 5\, 6\, 7\, 8\, 9\, 0 1 2 3 4 5 6 7 8 9
-
- [¶](Ñ)(Ñ)(Ç) (Ñ)(Ñ)(Ñ) (Ñ~) = {&&& thousand}
-
-
- will create a replacement string "375 thousand" from 375699. It is also
- possible to induct the whole matching suffix by placing an `*' after the
- text induction specifier as in the following example:
-
-
- #general numeric class
- %class numeric \, \. 0 1 2 3 4 5 6 7 8 9
-
-
- [$](numeric+) = {&* dollars}
-
-
- where $1,235.23 would create a replacement string "1,235.23 dollars". Note
- that space, quotes, ¶ (ALT P), ), (, ], [, and \ are the only characters
- that must be escaped in a match string if used as literal characters.
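
The way the induction specifiers build the replacement string can be sketched as follows. This is a minimal model, not the library's code: the function induce is hypothetical, `replacement` is the text between the braces, and `suffix_text` is assumed to be exactly the source text that matched the rule's suffix pattern.

```python
def induce(replacement, suffix_text):
    """Expand a {&...} or {!...} replacement with suffix characters.

    Returns (new_string, chars_induced). chars_induced is the number of
    source characters the translation pointer would advance past.
    """
    mode = replacement[:1]
    if mode not in ("&", "!"):
        return replacement, 0            # ordinary text replacement
    count = 0
    while count < len(replacement) and replacement[count] == mode:
        count += 1                       # one character per repeated & or !
    body = replacement[count:]
    if body.startswith("*"):
        body = body[1:]
        count = len(suffix_text)         # induct the whole suffix match
    induced = suffix_text[:count]
    if mode == "&":
        return induced + body, len(induced)   # induced text leads
    return body + induced, len(induced)       # induced text trails
```

With the examples from this manual, induce("&&& thousand", "375699") gives "375 thousand" and induce("!twenty ", "4") gives "twenty 4".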
-
- 7.2.8 CAUTION: Recursion Hazard.
-
- Accent file programmers should be aware that there is an increased risk
- of rule recursion in text induction rules, especially with empty pattern
- match and whole suffix induction rules. These text induction rules must be
- written to prevent the application of that SAME rule to its OWN replacement
- string. If the whole suffix is inducted on an empty match [¶], then there
- MUST be a prefix pattern requirement that CAN NOT be met by the text in the
- new replacement string.
-
- Examples:
-
- Bad Rule --
- $[¶](numeric+)$ = {!* number }
-
- This rule would cause the creation of a new string containing the text
- " number " and the class numeric text. The text "number" would be
- converted to something like "NAH4MBER" in the output buffer and respond to
- the class numeric text string by spawning an additional "number"-numeric
- string just like the previous string. This recursive spawning would
- continue until the maximum recursion limit is reached.
-
- Translator42/43 allow replacements nested 64 deep. If this limit is
- exceeded, the program aborts the current line of text.
-
- Better Rule --
-
- %class numeric 0 1 2 3 4 5 6 7 8 9 \. \,
- %class numberdone "number " 0 1 2 3 4 5 6 7 8 9 \. \,
-
- $(numberdone~)[¶](numeric+)$ = {!* number }
-
- The prefix requirement, (numberdone~), will prevent the recursive
- application of the rule. To be fully effective, the class numberdone or
- the prefix requirement should include provisions to anticipate the effects
- of all your other replacement rules on the numeric sequence.
-
- Good Rule --
-
- %class num 0 1 2 3 4 5 6 7 8 9
- %class tmark \,
-
- (num~)(num;)(num;)(num)(tmark;)[000](num~)={ thousand }
- (num~)(num;)(num;)(num)(tmark;)[00](num)(num~)={!* thousand and }
- (num~)(num;)(num;)(num)(tmark;)[0](num)(num)(num~)={!* thousand and }
-
- (num~)(num;)(num;)(num)(tmark;)[¶](num)(num)(num)(num~)={!* thousand }
-
- Note that this rule would convert a number string like
- [... 65,321 ...] to [... 65,{ thousand 321 } ...].
- ^ ^
- Recursion does not occur in this case because the required prefix does
- not exist in the new string { thousand 321 }. This example assumes that
- the "65," has been processed.
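
The recursion hazard and the nesting limit of 64 mentioned earlier in this section can be modelled roughly as below. This is a deliberately simplified sketch: replacement rules are reduced to a word-to-text dictionary, and the names expand, translate_line, and TooDeep are hypothetical. A rule whose replacement contains its own trigger nests until the limit is exceeded, at which point the line is abandoned.

```python
MAX_DEPTH = 64   # nesting limit stated for Translator42/43

class TooDeep(Exception):
    """Raised when replacements nest deeper than MAX_DEPTH."""

def expand(text, rules, depth=0):
    """Expand replacements word by word, decoding each one in isolation."""
    if depth > MAX_DEPTH:
        raise TooDeep(text)
    out = []
    for word in text.split():
        if word in rules:
            # The replacement string is itself translated, one level deeper.
            out.append(expand(rules[word], rules, depth + 1))
        else:
            out.append(word)
    return " ".join(out)

def translate_line(line, rules):
    """Return the expanded line, or None if the line had to be aborted."""
    try:
        return expand(line, rules)
    except TooDeep:
        return None
```

The rule {"7": "number 7"} behaves like the Bad Rule above: its replacement contains the text that triggers it again, so translate_line("take 7", ...) returns None.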
-
- 7.2.9 Eliminated: Language Changing Directives.
-
- Language changing in-line directives within the braces are no longer
- supported in Translator43. The previous example of this in the Translator42
- version of the Italiano accent file:
-
- [computer] = {\english computer} must be changed to:
-
- [computer] = KUMPYUW3TAH to produce the same effect.
-
- This is the only known instance where this feature was used.
-
- Direct insertion of the required phonemes eliminates potential problems
- resulting from the user redefining or disabling these directives and
- removes the requirement that the other language be present.
-
- The translator preferences tool may be used to determine the phonemes to
- be copied from a foreign language by the accent file programmer.
-
- 7.3 Phonemes.
-
- The phonemes listed in the original manual are reproduced here for easy
- reference.
-
- 7.3.1 Narrator Considerations. The last versions of the Narrator device I
- have are version V33.2 (5 Mar 1986), file size 23280 bytes, issued with OS
- 1.3, and version V37.7 (22 May 1991), 65760 bytes, issued with OS 2.04. The
- narrator programs function as programmable voice simulators and are capable
- of a wide range of effects. Narrator 33.2 simulates three vocal tract
- resonances or formants, the minimum required for good intelligible speech.
- Narrator 37.7 provides 5 formants for a more natural sounding voice. Also
- the frequencies and amplitudes of the three primary formants may be
- adjusted to change the quality of the voice. The original developer,
- SoftVoice Inc, is the only entity with the legal right to distribute or
- authorize distribution of that software.
-
-
- 7.3.2 Non-English Phonemes. The basic phonemes provided by the narrator
- appear to be intended for English only. Narrator 37.7 appears to be more
- English specific than narrator 33.2 as it replaces the /C phoneme with CH.
- However, as each phoneme is blended with its surrounding phonemes, it may
- be possible to create vowel or consonant clusters that provide better
- approximations for non-English sounds. This is most effective at rapid
- speaking rates. The missing /C phoneme may be approximated by KZH, KZHQ,
- or KZH/H with narrator 37.7. In this case, the unvoiced surrounding
- consonants silence the ZH and the ZH muffles the impulse of the K sound.
-
- 7.3.3 Phoneme List. The following is the list of "ARPAbet" phonemes used
- by the Narrator device.
-
- Vowels English
- IY bEEt, EAt
- IH bIt, In
- EH bEt, End
- AE bAt, Ad
- AA bArgain, tArget
- AH tUg, bUg, bUt, Up
- AO shORE, wAR
- UH bOOk, sOOt
- ER bIRd, EArly
- OH bOrder (sounds like the letter 'O' when used by itself)
- AX About (never stressed)
- IX solId (never stressed)
-
-
- Diphthongs
- EY bAY, AId
- AY bIde, I
- OY bOY, OIl
- AW bOUnd, OWl
- OW bOAt, OWn
- UW brEW, bOOlean, pOO,
- crEW (except that it is a diphthong)
-
-
- Consonants
- R Red
- RX Red (This is not mentioned in RKRM:Devs)
- W Wag
- M Men
- NX siNG
- S Soon
- F Fed
- Z haS, Zoo
- V Very
- CH CHeck
- /H Hole
- B But
- D Dog
- K Keg, Copy
- L Long
- LX Long (This is not mentioned in RKRM:Devs)
- Y Yellow
- N No
- SH SHy
- TH THin
- ZH pleaSure
- DH THen
- WH WHen
- J JuDGE
- /C supposedly loCH, or (german) baCH, but really like CHurCH
-
- Narrator version 37.7 pronounces this sound like the
- German tsch, English ch as stated above. KZH, /HZH; or
- KZH/H, /HZH/H when followed by vowels; sound closer to
- the mark.
-
- P Put
- T Toy (except before IY when it is pronounced D)
- G Guest
-
-
- Others
- DX piTY (tongue flap)
- Q kitt(Q)en (glottal stop)
- QX (Silent vowel - can lengthen the previous vowel)
-
-
- Contractions
- UL AXL
- IL IXL
- UM AXM (almost equal )
- IM IXM
- UN AXN (almost equal )
- IN IXN
-
-
- Symbols
- Digits 1-9 Syllabic stress
- . Sentence final character
- ? Question sentence final character
- - Phrase delimiter
- , Clause delimiter
- () Put parentheses about noun phrases
-
- ## End of speech (undocumented)
-
-
- Translator
- ` Do not add stress marks to this word
- # Word boundary for the purposes of adding stress marks
-
-